An Adaptive Descriptor Design for Object Recognition in the Wild 
Abstract

Digital images today come in many styles of appearance, differing in color tone, contrast, vignetting, and so on. These 'picture styles' are directly related to the scene radiance, the image pipeline of the camera, and post-processing functions. Because these causes are complex and nonlinear, popular gradient-based image descriptors are not invariant to picture style, which degrades object recognition performance. Given that images shared online or created by individual users are taken with a wide range of devices and may be processed by various post-processing functions, a robust object recognition system is both useful and challenging to build. In this paper, we present the first study of the influence of picture styles on object recognition, and propose an adaptive approach based on the kernel view of gradient descriptors and multiple kernel learning, which does not require estimating or specifying the styles of the images used in training and testing. We conduct experiments on the Domain Adaptation data set and the Oxford Flower data set. The experiments also include several variants of the flower data set obtained by processing the images with popular photo effects. The results demonstrate that our proposed method improves on standard descriptors in all cases.

1. Introduction 

Digital images vary in color tone, contrast, clarity, vignetting, and so on. We refer to such characteristics of digital images as picture styles. With the popularity of photo editing and sharing services such as Instagram, Facebook and Flickr on mobile devices, most digital images generated by users today are captured with a wide range of devices (e.g., smart phones and digital SLRs) and processed by different pixel editing functions (e.g., "lomo-fi" and "lord-kelvin" available in Instagram) to obtain distinct picture styles with strong personal artistic expression. Since the goal of object recognition research is to recognize natural scenes [13], daily objects [5], or fine-grained species [17, 22] from digital images, it is natural to extend the scope of object recognition from standard laboratory images to photos in the wild for daily use. Although there are a large number of picture styles, their causes fall into 3 categories: (1) scene radiance, (2) image pipeline, and (3) post processing. In Fig. 1 we show several pairs of images of the same objects with different picture styles.

Figure 1. Three pairs of images of the same objects with different picture styles. The difference between (a) and (b) is mainly caused by different scene radiance (illumination condition). (c) and (d) show the same object under the same conditions, taken by a digital SLR and a webcam respectively, which represent two different image pipelines. (f) is obtained by applying the Instagram™ lomo-fi effect filter to image (e), which is one kind of post processing.

Figure 2. Upper left: an original image from the Oxford Flower data set. Lower left: the lomo-fi version of the image. We select two frames at the same location in the two images (red boxes), and show the pixel patch, gradient, and SIFT descriptor for each of them. The difference between the two descriptors is plotted on the right.

To show the connection between image descriptors and picture styles, we take an image from the Oxford Flower data set and process it with a popular Instagram effect filter, lomo-fi. We select patches at the same location in the two images, and compute the gradients and SIFT descriptors of the patches, which are shown in Fig. 2. Although the two image patches are almost the same except for color tone, the resulting SIFT descriptors differ from each other by 33%, which will probably cause them to be quantized into two different dictionary words in the bag-of-words model. If the difference is already this large for two images that are almost identical in content, it is reasonable to infer that it could be larger for two images with different picture styles within one object class. Therefore, when the images used in training and testing do not have similar picture styles, the accuracy of object recognition will drop. Among previous work, only Domain Adaptation (DA) has considered one such situation [18], where part of the images are taken by a digital SLR and the rest by a webcam under similar conditions (e.g. (c) and (d) in Fig. 1). These methods assume that the images used in training are taken by a different device from the images used in testing.

Although DA offers some solutions to this special case by treating the two sets of images as two domains, its algorithms require the domain-ship of each image to be specified. However, in real-world applications, images collected from the Internet have no "domain labels", and the training and testing sets are always mixtures of all kinds of images with various picture styles. Furthermore, many more picture styles are created by users (e.g. Instagram users or iPhone camera app users) beyond the ones caused by different camera responses. Therefore, under a more general assumption than DA, a robust object recognition algorithm becomes both useful and challenging: it should overcome the difficulties introduced by different picture styles without knowing the style information.

In this paper, we study this general problem with a focus on descriptor design. Existing approaches usually ignore changes of picture style when computing the standard descriptors, and then try to reduce the influence of style changes through operations in the corresponding feature space. Such indirect methods are limited by the feature space and always require the style information of each image (e.g. the domain-ship in DA research). In this paper, we take the direct way. Suppose the picture styles of all the images in a data set form a point in a certain space, so that the original data set corresponds to a point $A$ in this space. We define a function $g : [0, 255] \to [0, 255]$ which can be applied to all the images in the original data set and projects point $A$ to $B$, where $B$ corresponds to the new data set of all the processed images and can be denoted as $B = g(A)$. Basically, $g(\cdot)$ is a pixel-level editing function that works identically for all images. Since the picture styles of an image affect object recognition performance through the descriptors, we assume there is an optimal $g^*$ that maps $A$ to a point where the set of training and testing images gives the best recognition accuracy. Searching for $g^*$ could be difficult, since there is no clear connection between a general function $g$ and the empirical risk of the classifier used in object recognition. However, by defining $g$ as a convex combination of several base functions, we link the pixel editing function with image descriptors and kernels, and propose an adaptive descriptor design based on kernel learning that achieves the equivalent objective. We derive the method based on kernel descriptors [2], but the final algorithm can be extended to existing standard descriptors as a general framework. In the following, we discuss related work in Section 1.1, revisit the kernel descriptors in Section 2, present the proposed method in Section 3, and report experiments in Section 4.
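As a concrete picture of a pixel-level editing function built as a convex combination of base functions, the following minimal sketch tabulates $g$ as a 256-entry lookup table and applies it identically to every pixel. The two base functions (identity and squaring) and the helper names `make_lut` and `apply_lut` are illustrative assumptions, not part of the method as specified in the paper.

```python
def make_lut(base_functions, weights):
    """Tabulate g = sum_i w_i * g_i as a map from [0, 255] to [0, 255]."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    weights = [w / total for w in weights]  # rescale to a convex combination
    lut = []
    for value in range(256):
        x = value / 255.0                   # scale the pixel value to [0, 1]
        y = sum(w * g(x) for w, g in zip(weights, base_functions))
        y = min(max(y, 0.0), 1.0)           # clamp so the output is a valid pixel
        lut.append(int(round(y * 255.0)))
    return lut


def apply_lut(image, lut):
    """Apply the same editing function to every pixel of a grayscale image."""
    return [[lut[p] for p in row] for row in image]


def identity(x):
    """Hypothetical base function that leaves the pixel unchanged."""
    return x


def square(x):
    """Hypothetical base function that darkens mid-tones."""
    return x * x


lut = make_lut([identity, square], [0.5, 0.5])
image = [[0, 64, 128], [192, 255, 32]]      # toy 2 x 3 grayscale image
edited = apply_lut(image, lut)
print(edited)
```

Because the table depends only on the pixel value, the same function is applied to all images of a data set, which is the sense in which $g$ moves the whole data set from $A$ to $B$.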

1.1. Related Works 

Domain Adaptation is probably the area most closely related to our problem. In the data set introduced in [18], images from dslr and webcam mainly differ in picture styles, which is similar to the focus of this paper. Metric learning based methods [12, 18] and Grassmann manifold based methods [7, 10] were proposed. As stated above, these DA methods cannot solve our problem in general situations, since the domain-ship is unknown and hard to specify for images in the wild. Works like [9, 11, 25] estimate models of image pipelines, but such estimates are difficult to obtain and have no clear relationship with the descriptors and recognition accuracy. In the area of key point matching, several robust descriptors have been proposed, such as DAISY [20], GIH [14] and DaLI [16]. Descriptor learning methods [19, 23, 24] have also been developed to determine the parameters of descriptor computation by optimization. All of these methods are designed for key point matching between image pairs. This different goal leads to descriptors that are not suitable for object recognition, since they are too discriminative to tolerate the within-class variance of object categories.

2. Kernel Descriptor Revisit 

The kernel descriptor (KDES) was proposed by Bo et al. in [2], and gives a unified framework and parametric form for local image descriptors. Let $z$ denote a pixel at coordinate $z$, $m(z)$ the magnitude of the image gradient at pixel $z$, and $\theta(z)$ the orientation of the image gradient. $m(z)$ and $\theta(z)$ are normalized over the patch that contains $z$ into $\tilde{m}(z)$ and $\tilde{\theta}(z)$. According to this kernel view, the gradient match kernel between two image patches $P$ and $Q$ can be described as

$$k_{grad}(P, Q) = \sum_{z \in P} \sum_{z' \in Q} \tilde{m}(z)\, \tilde{m}(z')\, k_o(\tilde{\theta}(z), \tilde{\theta}(z'))\, k_p(z, z'), \tag{1}$$

where $k_p(z, z') = \exp(-\gamma_p \|z - z'\|^2)$ is a Gaussian position kernel and $k_o(\tilde{\theta}(z), \tilde{\theta}(z')) = \exp(-\gamma_o \|\tilde{\theta}(z) - \tilde{\theta}(z')\|^2)$ is a Gaussian kernel over gradient orientations. The magnitude is normalized as $\tilde{m}(z) = m(z) / \sqrt{\sum_{z \in P} m(z)^2 + \epsilon_g}$, where $\epsilon_g$ is a small number. The orientation is normalized as $\tilde{\theta}(z) = [\sin(\theta(z)), \cos(\theta(z))]$. To build compact feature vectors from these kernels for efficient computation, [2] presented a sufficient finite-dimensional approximation to obtain finite-dimensional feature vectors and reduced the dimension by kernel principal component analysis, which provides a closed form for the descriptor vector $F_{grad}(P)$ of patch $P$ such that $k_{grad}(P, Q) = F_{grad}(P)^\top F_{grad}(Q)$. Bo et al. [2] also showed that gradient-based descriptors such as SIFT [15], SURF [1], and HoG [4] are special cases under this kernel view framework.

For the image-level descriptors, Bo and Sminchisescu [3] presented Efficient Match Kernels (EMK), which provide a general kernel view of matching between two images as two sets of local descriptors. They demonstrated that the Bag-of-Words (BoW) model and Spatial Pyramid Matching are two special cases under this framework. Let $X$ and $Y$ denote the sets of local descriptors for images $I_x$ and $I_y$ respectively, where $x \in X$ is a descriptor vector computed from patch $P_x$ in image $I_x$. When applying EMK on top of gradient KDES, we get the image-level kernel as

$$K_{emk}(I_x, I_y) = \frac{1}{|X||Y|} \sum_{x \in X} \sum_{y \in Y} k(x, y) = \frac{1}{|X||Y|} \sum_{x \in X} \sum_{y \in Y} k_{grad}(P_x, P_y), \tag{2}$$

where $|\cdot|$ is the cardinality of a set. [3] gave a closed-form compact approximation of the feature vector such that $K_{emk}(I_x, I_y) = \Phi(I_x)^\top \Phi(I_y)$, which allows the matching kernel to be used in real applications with efficient computation and storage.

3. Proposed Method

3.1. Kernel Descriptor with Editing Functions

As stated in the Introduction, we want to apply a pixel editing function $g$ to the images used for object recognition. In this section, we give the relationship between pixels and descriptors under this function $g$. Take $g(u(z)) = a_1 u(z) + a_2 u(z)^2$ as an example, where $a_1, a_2$ are non-negative, $z$ is a pixel from image patch $P$, and $u(z)$ is the pixel value at position $z$. Let $g(P)$ denote the new patch after applying $g$ to the pixels of $P$. Then the image gradient at $z$ becomes

$$\nabla g(u(z)) = g'(u(z))\, \nabla u(z) = (a_1 + 2 a_2 u(z))\, \nabla u(z), \tag{3}$$

where $a_1 + 2 a_2 u(z)$ is a scalar and $\nabla u(z)$ is a vector. Let $m_g(z)$ and $\theta_g(z)$ be the magnitude and orientation of the gradient at $z$ of patch $g(P)$, and $m(z)$ and $\theta(z)$ be the corresponding values for patch $P$. Under the assumption that $a_1, a_2 \ge 0$, we have

$$m_g(z) = \|\nabla g(u(z))\| = \|(a_1 + 2 a_2 u(z))\, \nabla u(z)\| = a_1 \|\nabla u(z)\| + a_2 \|\nabla (u(z)^2)\|. \tag{4}$$

It is clear that $\theta_g(z) = \theta(z)$, and therefore $\tilde{\theta}(z) = \tilde{\theta}_g(z)$, which means the orientation is invariant to the pixel editing functions applied to the image patch. Notice that the magnitudes used in the gradient match kernel are normalized over local patches, which is very important to make the contextual information comparable across patches. Let $m_2(z)$ denote $\|\nabla (u(z)^2)\|$ for convenience. To retain the simple convex combination form of $m_g(z)$, we propose a new local normalization

$$\tilde{m}_g(z) = a_1 \tilde{m}(z) + a_2 \tilde{m}_2(z), \tag{5}$$

where $\tilde{m}_g(z)$ denotes the new normalized magnitude of $m_g(z)$, and $\tilde{m}(z)$ and $\tilde{m}_2(z)$ are normalized by the $\ell_2$ norm as described in Section 2. $\tilde{m}_g(z)$ is also locally normalized and still comparable among different patches. Since the goal of our method is object recognition, any proper local normalization method is acceptable.
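The chain-rule step in Eq. (3), the magnitude decomposition in Eq. (4), and the orientation invariance can be checked numerically. The sketch below is only an illustration under stated assumptions: the smooth synthetic image `u`, the weights `A1` and `A2`, and the sample points are all made up, and gradients are approximated by central differences rather than by any particular image gradient filter.

```python
import math

A1, A2 = 0.6, 0.4  # illustrative non-negative weights (assumed values)
H = 1e-3           # step of the central differences


def u(x, y):
    """Smooth synthetic image with values in [0.1, 0.9]."""
    return 0.5 + 0.4 * math.sin(x) * math.cos(y)


def g_of_u(x, y):
    """Edited image g(u) = a1*u + a2*u^2."""
    v = u(x, y)
    return A1 * v + A2 * v * v


def u_squared(x, y):
    """Squared image u^2, whose gradient magnitude is m_2."""
    return u(x, y) ** 2


def gradient(f, x, y):
    """Central-difference gradient of f at (x, y)."""
    fx = (f(x + H, y) - f(x - H, y)) / (2 * H)
    fy = (f(x, y + H) - f(x, y - H)) / (2 * H)
    return fx, fy


def check(x, y):
    """Return the errors of Eq. (3), Eq. (4) and the orientation at one pixel."""
    ux, uy = gradient(u, x, y)
    gx, gy = gradient(g_of_u, x, y)
    sx, sy = gradient(u_squared, x, y)
    scale = A1 + 2 * A2 * u(x, y)  # the scalar g'(u(z))
    err_eq3 = math.hypot(gx - scale * ux, gy - scale * uy)
    m, m2, m_g = math.hypot(ux, uy), math.hypot(sx, sy), math.hypot(gx, gy)
    err_eq4 = abs(m_g - (A1 * m + A2 * m2))
    err_theta = abs(math.atan2(gy, gx) - math.atan2(uy, ux))
    return err_eq3, err_eq4, err_theta


errors = [check(x, y) for x, y in [(0.3, 0.2), (1.0, 0.7), (1.7, 1.1)]]
for e in errors:
    print("Eq.(3) %.1e  Eq.(4) %.1e  orientation %.1e" % e)
```

All three errors stay at the level of the finite-difference approximation. The decomposition in Eq. (4) relies on the pixel values and the weights being non-negative, so that the scalar factor is never negative.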

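Now, given two image patches $g(P)$ and $g(Q)$, obtained by applying the editing function $g$ to patches $P$ and $Q$, we derive the gradient match kernel between them as follows:

$$\begin{aligned}
k_{grad}(g(P), g(Q)) &= \sum_{z \in g(P)} \sum_{z' \in g(Q)} \tilde{m}_g(z)\, \tilde{m}_g(z')\, k_o(\tilde{\theta}_g(z), \tilde{\theta}_g(z'))\, k_p(z, z') \\
&= \sum_{z \in P} \sum_{z' \in Q} (a_1 \tilde{m}(z) + a_2 \tilde{m}_2(z)) (a_1 \tilde{m}(z') + a_2 \tilde{m}_2(z'))\, k_o(\tilde{\theta}(z), \tilde{\theta}(z'))\, k_p(z, z') \\
&= a_1 a_1 \sum_{z \in P} \sum_{z' \in Q} \tilde{m}(z)\, \tilde{m}(z')\, k_o(\tilde{\theta}(z), \tilde{\theta}(z'))\, k_p(z, z') \\
&\quad + a_1 a_2 \sum_{z \in P} \sum_{z' \in Q} \tilde{m}(z)\, \tilde{m}_2(z')\, k_o(\tilde{\theta}(z), \tilde{\theta}(z'))\, k_p(z, z') \\
&\quad + a_2 a_1 \sum_{z \in P} \sum_{z' \in Q} \tilde{m}_2(z)\, \tilde{m}(z')\, k_o(\tilde{\theta}(z), \tilde{\theta}(z'))\, k_p(z, z') \\
&\quad + a_2 a_2 \sum_{z \in P} \sum_{z' \in Q} \tilde{m}_2(z)\, \tilde{m}_2(z')\, k_o(\tilde{\theta}(z), \tilde{\theta}(z'))\, k_p(z, z') \\
&= a_1 a_1 k_{grad}(P, Q) + a_1 a_2 k_{grad}(P, Q^2) + a_2 a_1 k_{grad}(P^2, Q) + a_2 a_2 k_{grad}(P^2, Q^2),
\end{aligned} \tag{6}$$

where $P^2$ and $Q^2$ denote the patches containing the squared pixel values of $P$ and $Q$. It is worth noting that $k_{grad}$ above differs from the standard $k_{grad}$ in Eq. (1), since we define a different normalization in Eq. (5). Let us rewrite $g$ as $g(u(z)) = a_1 g_1(u(z)) + a_2 g_2(u(z))$, where $g_1(u(z)) = u(z)$ and $g_2(u(z)) = u(z)^2$. Then Eq. (6) indicates

$$k_{grad}(g(P), g(Q)) = \sum_{i=1}^{2} \sum_{j=1}^{2} a_i a_j\, k_{grad}(g_i(P), g_j(Q)). \tag{7}$$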
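The expansion in Eq. (6) and Eq. (7) is a consequence of the match kernel being bilinear in the normalized magnitudes. A minimal numerical check is sketched below; the kernel widths `GAMMA_P` and `GAMMA_O`, the weights `a`, and the random toy patches are assumptions chosen only for illustration, with random numbers standing in for the normalized magnitudes $\tilde{m}$ and $\tilde{m}_2$.

```python
import math
import random

GAMMA_P, GAMMA_O = 0.5, 5.0  # kernel widths, illustrative values only


def k_p(z, zq):
    """Gaussian position kernel."""
    return math.exp(-GAMMA_P * ((z[0] - zq[0]) ** 2 + (z[1] - zq[1]) ** 2))


def k_o(t, tq):
    """Gaussian kernel over normalized orientations [sin, cos]."""
    d2 = (math.sin(t) - math.sin(tq)) ** 2 + (math.cos(t) - math.cos(tq)) ** 2
    return math.exp(-GAMMA_O * d2)


def k_grad(mags_p, patch_p, mags_q, patch_q):
    """Eq. (1): each patch is a list of (position, orientation) per pixel."""
    return sum(
        mp * mq * k_o(tp, tq) * k_p(zp, zq)
        for mp, (zp, tp) in zip(mags_p, patch_p)
        for mq, (zq, tq) in zip(mags_q, patch_q)
    )


def random_patch(rng, size=3):
    """Pixels on a size x size grid with random orientations and magnitudes."""
    pixels = [((r, c), rng.uniform(-math.pi, math.pi))
              for r in range(size) for c in range(size)]
    m1 = [rng.random() for _ in pixels]  # stands in for normalized m
    m2 = [rng.random() for _ in pixels]  # stands in for normalized m_2
    return pixels, (m1, m2)


def mix(mags, a):
    """Eq. (5): normalized magnitude of the edited patch."""
    return [a[0] * x + a[1] * y for x, y in zip(*mags)]


rng = random.Random(0)
a = (0.7, 0.3)
patch_p, mags_p = random_patch(rng)
patch_q, mags_q = random_patch(rng)

lhs = k_grad(mix(mags_p, a), patch_p, mix(mags_q, a), patch_q)
rhs = sum(a[i] * a[j] * k_grad(mags_p[i], patch_p, mags_q[j], patch_q)
          for i in range(2) for j in range(2))
print(abs(lhs - rhs))
```

The two sides agree up to floating-point rounding, since orientations and positions are untouched by the editing function and only the magnitudes are mixed.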

As stated in the Introduction, this links the pixel editing functions with image descriptors. To see this connection at the image level, we plug Eq. (7) into Eq. (2) and get the image-level kernel

$$\begin{aligned}
K_{emk}(g(I_x), g(I_y)) &= \frac{1}{|X||Y|} \sum_{x \in X} \sum_{y \in Y} k_{grad}(g(P_x), g(P_y)) \\
&= \frac{1}{|X||Y|} \sum_{x \in X} \sum_{y \in Y} \sum_{i=1}^{2} \sum_{j=1}^{2} a_i a_j\, k_{grad}(g_i(P_x), g_j(P_y)) \\
&= \sum_{i=1}^{2} \sum_{j=1}^{2} a_i a_j \left( \frac{1}{|X||Y|} \sum_{x \in X} \sum_{y \in Y} k_{grad}(g_i(P_x), g_j(P_y)) \right) \\
&= \sum_{i=1}^{2} \sum_{j=1}^{2} a_i a_j\, K(g_i(I_x), g_j(I_y)) \\
&= \sum_{m=1}^{4} d_m K_m,
\end{aligned} \tag{8}$$

where $d_m$ and $K_m$ have a one-to-one correspondence to $a_i a_j$ and $K(g_i(I_x), g_j(I_y))$, and the order does not matter since the terms are exchangeable in the summation. Since we limited $a_i$ to be non-negative when we first defined $g$, the $d_m$'s are also non-negative. In addition, the $K_m$'s are positive definite (PD) kernels, which makes $K_{emk}$ here a convex combination of PD kernels that can be used in standard multiple kernel learning. Therefore, we transfer the search for the optimal $g^*$ into learning the optimal kernel weights through Eq. (8). In general, for $N$ base editing functions, there will be $N^2$ base kernels in Eq. (8).

Figure 3. Plots of proposed base functions.
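The decomposition in Eq. (8) can be illustrated with linear kernels on image-level feature vectors. The sketch below assumes that each image already has one feature vector per base function (the toy vectors `feats_x` and `feats_y` are made up) and shows that, when the weights factor as $d_m = a_i a_j$, the weighted sum of the $N^2$ base kernels equals a single linear kernel on the combined feature $\sum_i a_i \Phi_i$.

```python
def dot(u, v):
    """Inner product of two equally long vectors."""
    return sum(x * y for x, y in zip(u, v))


def base_kernels(feats_x, feats_y):
    """N x N table of K(g_i(I_x), g_j(I_y)) using linear kernels."""
    n = len(feats_x)
    return [[dot(feats_x[i], feats_y[j]) for j in range(n)] for i in range(n)]


def combined_kernel(feats_x, feats_y, a):
    """Weighted sum of the N^2 base kernels with weights a_i * a_j."""
    k = base_kernels(feats_x, feats_y)
    n = len(a)
    return sum(a[i] * a[j] * k[i][j] for i in range(n) for j in range(n))


def combined_feature(feats, a):
    """Single feature vector sum_i a_i * Phi_i for one image."""
    dim = len(feats[0])
    return [sum(a[i] * feats[i][d] for i in range(len(a))) for d in range(dim)]


a = [0.7, 0.3]                               # illustrative weights
feats_x = [[0.2, 0.5, 0.3], [0.1, 0.1, 0.8]]  # toy features of image I_x
feats_y = [[0.4, 0.4, 0.2], [0.6, 0.3, 0.1]]  # toy features of image I_y

k_sum = combined_kernel(feats_x, feats_y, a)
k_single = dot(combined_feature(feats_x, a), combined_feature(feats_y, a))
print(k_sum, k_single)
```

If the kernel weights are learned freely rather than constrained to the product form $a_i a_j$, this exact equivalence to a single combined feature no longer has to hold; the sketch only covers the factored case.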

3.2. Base Editing Functions 

From Section 3.1 we know that the selection of base functions is as important as learning the parameters. By exploring the editing functions used in photography, we found that Gamma correction and the "S" curve are two major categories of photo effects. Gamma correction can brighten ($\gamma < 1$) or darken ($\gamma > 1$) the images, and the "S" curve can increase the contrast. Although these popular functions were created to make photos visually pleasing, we believe that they can also benefit the computation of better descriptors. For example, brightening can bring back more details in the dark parts of an image; darkening can suppress the irrelevant areas of an object image, since most images are correctly exposed for the central object; higher contrast emphasizes texture and shapes. Therefore, these three types of functions make good candidates for base functions. However, the power-law function used in Gamma correction contains a free parameter that would be left in the gradient and cannot be combined using simple addition, so a good approximation is desired. And since the "S" curve does not have a standard formulation, we adopt the sigmoid function for our algorithm. The three base functions that we propose here are:

$$g_1(x) = 0.3 \left( \log(2x + 0.1) + |\log(0.1)| \right) \tag{9}$$

$$g_2(x) = 0.8 x^2 \tag{10}$$

where $x$ takes a value in $[0, 1]$ as a scaled image pixel. We show the plots of these functions in Fig. 3. From the plots we can see that $g_1$ could serve as a brightening function, similar to gamma correction with $\gamma < 1$. $g_2$ has a shape like gamma correction with $\gamma = 2$, which could be used for darkening. And $g_3$ has an "S" shape that increases the contrast by brightening the brights and darkening the darks. The effects of these functions can be seen in Fig. 4.
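The behaviour of the base functions can be tabulated directly. In the sketch below, `g1` and `g2` follow Eq. (9) and Eq. (10), with the logarithm taken as the natural logarithm (an assumption). The expression for the sigmoid $g_3$ is not given in the text above, so `g3_assumed` is a hypothetical logistic curve whose `slope` and `center` are placeholders, not the authors' parameters.

```python
import math


def g1(x):
    """Brightening curve of Eq. (9), natural logarithm assumed."""
    return 0.3 * (math.log(2.0 * x + 0.1) + abs(math.log(0.1)))


def g2(x):
    """Darkening curve of Eq. (10)."""
    return 0.8 * x ** 2


def g3_assumed(x, slope=10.0, center=0.5):
    """Hypothetical S-curve; slope and center are assumptions for illustration."""
    return 1.0 / (1.0 + math.exp(-slope * (x - center)))


for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print("x=%.2f  g1=%.3f  g2=%.3f  g3=%.3f" % (x, g1(x), g2(x), g3_assumed(x)))
```

On the scaled range $[0, 1]$, `g1` maps 0 to 0 and lifts dark values above the identity, `g2` stays below the identity, and the assumed S-curve pushes values below 0.5 down and values above 0.5 up.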

3.3. Learning of the Parameters 

After defining the base functions, the next question is how to estimate the weighting coefficients for object recognition. According to Eq. (8), the image-level kernel can be decomposed as a convex combination of several base kernels. We adopt General Multiple Kernel Learning (GMKL) [21] and put non-negative constraints on the kernel coefficients. We also notice that some weighting coefficients are equal (e.g. $a_1 a_2 = a_2 a_1$), but the experiments show that the results are similar with or without this constraint on the weights. Since the number of kernels in our algorithm is not large, we use the standard GMKL with $\ell_2$ norm regularization.

3.4. Adaptive Descriptor Design 

In this section, we summarize the proposed adaptive descriptor design as follows:

1. Process image $I$ from the data set with $\{g_i\}_{i=1}^{3}$.

2. Compute gradient-based descriptors for $I$ and its 3 variants to get 4 descriptors.

3. Build a codebook using K-means by sampling from all the training images and all 4 descriptors of each.

4. Quantize each image from the training and testing sets into 4 image-level feature vectors based on the descriptors.

5. For two images, compute linear kernels between any two of their 4 image-level features to get 16 base kernels.

6. Train GMKL on the 16 base kernels to get the optimal kernel weights and classifiers.
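Step 5 above can be sketched as follows, assuming steps 1 to 4 have already produced 4 image-level feature vectors per image. The random features, the helper names, and the uniform weights (which correspond to an averaging kernel) are illustrative assumptions; the GMKL training of step 6 is not reproduced here.

```python
import random


def linear_kernel(u, v):
    """Inner product of two image-level feature vectors."""
    return sum(x * y for x, y in zip(u, v))


def base_gram_matrices(features):
    """features[n][i] is the feature vector of image n under variant i."""
    n_images, n_variants = len(features), len(features[0])
    return {
        (i, j): [[linear_kernel(features[p][i], features[q][j])
                  for q in range(n_images)] for p in range(n_images)]
        for i in range(n_variants) for j in range(n_variants)
    }


def combine(grams, weights):
    """Weighted sum of the base Gram matrices."""
    n = len(next(iter(grams.values())))
    return [[sum(weights[key] * grams[key][p][q] for key in grams)
             for q in range(n)] for p in range(n)]


rng = random.Random(0)
# 5 toy images, 4 variants each (original + 3 edited), 8-dimensional features.
features = [[[rng.random() for _ in range(8)] for _ in range(4)]
            for _ in range(5)]

grams = base_gram_matrices(features)
average_weights = {key: 1.0 / len(grams) for key in grams}
k_avg = combine(grams, average_weights)
print(len(grams), len(k_avg), len(k_avg[0]))
```

Note that the Gram matrix for the pair (i, j) is the transpose of the one for (j, i), so the combined matrix is symmetric whenever the two receive the same weight, as they do in the averaging kernel.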

Our proposed method does not require prior knowledge of the picture styles of training or testing images, and the Adaptive Descriptor Design (ADD) can work as a general and flexible framework. In step 1, other suitable functions can be used as base editing functions besides the ones we used here. According to the analysis in [2], most gradient-based descriptors, such as SIFT [15], SURF [1] and HoG [4], are special cases of the kernel descriptor (KDES), so all of them can be used in step 2 to compute descriptors from image patches. In addition, the quantization method used in step 4 can be chosen from Bag-of-Words, Spatial Pyramid Matching and Efficient Match Kernel, since the former two are special cases of EMK. In other words, our proposed algorithm can be used widely to improve previous methods that are based on gradient descriptors and SVMs.

We also want to point out that the proposed ADD is a single-feature method, although GMKL is used for estimating the coefficients. Essentially, the final kernel obtained through GMKL is equivalent to a single kernel based on standard descriptors of images processed by the optimal editing function $g^*$.
Source | Target | standard KDES | ADD_AK       | ADD_GMKL
-------|--------|---------------|--------------|-------------
dslr   | webcam | 49.30 ± 1.26  | 50.28 ± 1.02 | 54.81 ± 1.07
webcam | dslr   | 46.67 ± 0.80  | 48.17 ± 0.99 | 50.33 ± 0.79
amazon | dslr   | 47.43 ± 2.79  | 48.57 ± 2.51 | 53.90 ± 2.67

Table 1. Experiments on the DA data set based on KDES. The average accuracy in % is reported together with the corresponding standard deviation.



4. Experiments 

In this section, we describe the details of the experiments and report the results of the proposed method in comparison with standard gradient-based image descriptors. We conduct object recognition on the Domain Adaptation data set and the Oxford Flower data set. We also process the images from the Oxford Flower data set using several popular photo effects in Instagram.¹

4.1. Domain Adaptation 

The Domain Adaptation data set was introduced by [18], where images of the same categories of objects come from different sources (called domains): amazon, dslr and webcam. As stated in the Introduction, the two domains dslr and webcam differ only in picture styles, which are caused by image pipelines. To apply the proposed ADD algorithm, we adopt two sets of features, KDES + EMK and SURF + BoW, to demonstrate that ADD can work as a framework to improve the performance of gradient-based descriptors in general. We follow the experimental protocol used in [12, 18] for semi-supervised domain adaptation. It is worth noting that we did not use any domain-ship information to specify the picture styles of images; our proposed method determines an optimal descriptor automatically based on the training set.

4.1.1 ADD based on KDES and EMK 

We extract KDES descriptors from all the images in the three domains and create a 1,500-word codebook by applying K-means clustering to a subset of all 4 types of descriptors (original + 3 variants of each image) from the amazon domain. This codebook is then used to quantize the 4 types of descriptors of the images of all 3 domains using EMK. After obtaining the 16 linear kernels by computing the inner product of every two types of descriptors between two given images, we conduct object recognition experiments using SVMs for: the standard KDES, the averaging kernel of these 16 kernels (AK), and GMKL based on the 16 kernels. We show the results in Table 1, from which we can see that the proposed Adaptive Descriptor Design outperforms the standard KDES in all cases, for both the averaging kernel and the optimal kernel learned by GMKL. In particular, the ADD_GMKL method improves on the standard KDES by 3.7 to 6.5 percentage points, which is close to the improvements obtained by domain adaptation methods [7, 10, 12, 18], where domain-ship information is used. In addition, we would like to point out that our proposed ADD is actually a single-feature method, since the learned kernel weights can be seen as the combination coefficients of the optimal pixel editing function, and the whole process is equivalent to using a single KDES descriptor extracted from $g^*(I)$ for a given image $I$.

¹ We use Adobe Photoshop™ action files created by Daniel Box, which give effects similar to those of Instagram™.

Source | Target | standard SURF | ADD_AK       | ADD_GMKL
-------|--------|---------------|--------------|-------------
dslr   | webcam | 37.05 ± 1.72  | 41.61 ± 1.05 | 42.00 ± 1.16
webcam | dslr   | 30.09 ± 0.81  | 36.57 ± 0.75 | 36.45 ± 0.49
amazon | dslr   | 34.49 ± 1.30  | 40.62 ± 1.59 | 36.19 ± 2.04

Table 2. Experiments on the DA data set based on SURF. The average accuracy in % is reported together with the corresponding standard deviation.

4.1.2 ADD based on SURF and BoW 

To show the generalization ability of ADD, we follow previous methods [7, 10, 12, 18] and extract standard SURF descriptors from the original and the 3 variants of each image; an 800-word codebook is then created from the amazon domain. All the images in the 3 domains are quantized with this codebook using vector quantization to get Bag-of-Words features. After obtaining the 16 linear kernels, we again conduct experiments using the standard SURF, the averaging kernel, and the optimal kernel learned by GMKL. We report the results in Table 2. The proposed ADD methods also outperform the standard SURF descriptor in all cases. However, in this experiment, the averaging kernel gives better results than the GMKL-learned kernel in some of the scenarios. We attribute the worse performance of GMKL-based ADD to the lack of training data, since the SURF descriptors are sparsely extracted from images and only 11 training images per category (8 from the source domain and 3 from the target domain) are used. Nevertheless, the results for ADD_AK and ADD_GMKL are sufficient to show that the proposed Adaptive Descriptor Design can be applied widely on top of gradient-based descriptors, for different tasks.
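The gains discussed in this section can be read directly off Tables 1 and 2. The short sketch below recomputes, for each source/target pair, the difference in mean accuracy between each ADD variant and the standard descriptor. It uses only the reported means, so it says nothing about statistical significance; the dictionary and function names are illustrative.

```python
# Mean accuracies in % as (standard, ADD_AK, ADD_GMKL), copied from the tables.
TABLE_1_KDES = {
    ("dslr", "webcam"): (49.30, 50.28, 54.81),
    ("webcam", "dslr"): (46.67, 48.17, 50.33),
    ("amazon", "dslr"): (47.43, 48.57, 53.90),
}
TABLE_2_SURF = {
    ("dslr", "webcam"): (37.05, 41.61, 42.00),
    ("webcam", "dslr"): (30.09, 36.57, 36.45),
    ("amazon", "dslr"): (34.49, 40.62, 36.19),
}


def gains(table):
    """Percentage-point gain of ADD_AK and ADD_GMKL over the standard descriptor."""
    return {
        pair: (round(ak - base, 2), round(gmkl - base, 2))
        for pair, (base, ak, gmkl) in table.items()
    }


for name, table in (("KDES", TABLE_1_KDES), ("SURF", TABLE_2_SURF)):
    for (source, target), (gain_ak, gain_gmkl) in gains(table).items():
        print("%s %s->%s: AK +%.2f, GMKL +%.2f"
              % (name, source, target, gain_ak, gain_gmkl))
```

With KDES the learned kernel gives the larger gain for every pair, whereas with SURF the averaging kernel is ahead for the webcam to dslr and amazon to dslr pairs.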

4.2. Object Recognition on Oxford Flower with Photo Effects

To simulate the images used in real-world applications, which are taken by different devices and go through various kinds of pixel-level editing, we process the images from the Oxford Flower data set with 3 effect filters that are popular in Instagram™: lomo-fi, lord-kelvin, and Nashville. Along with the original images, we obtain an image data set with 4 effects. Since the original flower images were collected from many different sources and taken by different devices under different conditions, the effects of scene radiance and image pipelines are already taken into consideration. We show an example image and its 3 variants in Fig. 5.

Figure 5. From left to right: original image, lomo-fi, lord-kelvin, and Nashville.

Since KDES gives a general framework for gradient-based descriptors, we only use KDES in the experiments in this section. For convenience, we refer to the data sets obtained by applying effect filters as picture styles. As in Section 4.1.1, we extract KDES for all the images from all 4 styles (original, lomo-fi, lord-kelvin, and Nashville). We construct a 2,000-word codebook by sampling the 4 types of descriptors from the original style. Then all the images from the 4 effects are quantized using EMK with this codebook. To simulate real image collections as mixtures of images with different picture styles, each image is selected only once, from one of the 4 styles, to obtain an experimental data set. For example, the 4 images in Fig. 5 are variants of the same image, so only one of them will be selected for one experiment. We conduct experiments on different style combinations to demonstrate the robustness of the proposed ADD method. In particular, experiments on single-style data sets are conducted for each of the 4 styles. Mixtures of the original and one of the other three styles are also used. Finally, a mixture of all 4 styles is generated. After constructing the experimental data sets by uniform sampling, we randomly split the images into 40 per category for training and the rest for testing. On average, equal numbers of images from the different styles appear in training and testing. Unlike domain adaptation, the images used here in training or testing are not separated by domain-ship, which is closer to real-world applications where there is no prior information about the domain-ship or picture styles of the images. We perform object recognition using SVMs for the standard KDES, the averaging kernel, and the kernel learned by GMKL. We report the results over 10 runs of experimental data set generation and training/testing splitting for each scenario in Table 3.
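The construction of one mixed-style experimental data set can be sketched as below. The sketch assumes 17 categories of 80 images each, which is consistent with the 1360 images mentioned later in this section, and uses made-up image identifiers; it illustrates the sampling protocol only and is not the authors' experiment code.

```python
import random

STYLES = ["original", "lomo-fi", "lord-kelvin", "Nashville"]


def build_mixture(image_ids, styles, rng):
    """Select each image exactly once, in one uniformly chosen style."""
    return {image_id: rng.choice(styles) for image_id in image_ids}


def split_per_category(labels, n_train, rng):
    """Randomly take n_train images per category for training, rest for testing."""
    by_category = {}
    for image_id, category in labels.items():
        by_category.setdefault(category, []).append(image_id)
    train, test = [], []
    for category in sorted(by_category):
        ids = sorted(by_category[category])
        rng.shuffle(ids)
        train.extend(ids[:n_train])
        test.extend(ids[n_train:])
    return train, test


rng = random.Random(0)
# Hypothetical identifiers: 17 categories x 80 images = 1360 images (assumed layout).
labels = {"img_%02d_%02d" % (c, k): c for c in range(17) for k in range(80)}

style_of = build_mixture(labels, STYLES, rng)      # mixture of all 4 styles
train, test = split_per_category(labels, 40, rng)  # 40 training images per category
print(len(train), len(test))
```

Restricting `STYLES` to one entry gives a single-style data set, and restricting it to the original plus one other style gives the two-style mixtures; repeating the procedure with different seeds corresponds to the repeated runs.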

From Table 3 we can clearly see that the proposed ADD_GMKL method is superior to the standard KDES in all cases. From the top 4 rows of Table 3 we find that the recognition accuracy decreases when the images have different picture styles, which confirms the motivation described in Section 1. In addition, according to the single-style experimental results, pixel-level editing functions can influence the recognition accuracy through the computation of descriptors when the editing functions are applied to all the images used in training and testing, which supports the proposed idea that learning an optimal editing function $g^*$ can improve object recognition using gradient-based descriptors. In the last case, when images are uniformly sampled from all 4 styles, the standard KDES descriptor gives its worst performance, which is reasonable since the higher diversity of the images leads to larger differences between descriptors of similar image patches of the same objects.

style 1     | style 2     | standard KDES | ADD_AK       | ADD_GMKL
------------|-------------|---------------|--------------|-------------
original    | n/a         | 69.35 ± 2.20  | 67.76 ± 2.65 | 74.32 ± 1.77
original    | lomo-fi     | 65.85 ± 1.66  | 64.24 ± 1.88 | 71.62 ± 1.04
original    | lord-kelvin | 67.53 ± 1.32  | 66.06 ± 1.80 | 72.82 ± 0.69
original    | Nashville   | 66.44 ± 1.73  | 64.62 ± 1.39 | 71.88 ± 0.72
lomo-fi     | n/a         | 65.12 ± 1.48  | 63.97 ± 1.62 | 69.82 ± 0.70
lord-kelvin | n/a         | 68.09 ± 1.38  | 67.06 ± 1.40 | 72.62 ± 1.38
Nashville   | n/a         | 67.03 ± 1.10  | 66.85 ± 1.28 | 71.68 ± 1.94
all         | n/a         | 64.56 ± 0.90  | 63.24 ± 0.58 | 69.88 ± 0.51

Table 3. Experiments on the Oxford Flower data set. The average accuracy in % is reported together with the corresponding standard deviation.

Having demonstrated that the problem introduced in this paper exists widely, the improved performance of ADD_GMKL shows that our proposed algorithm is an effective solution. Recall that our ADD can be considered a single-feature method; the proposed ADD_GMKL outperforms the state of the art [6] by 4% on the original Oxford Flower data set. Therefore, the Adaptive Descriptor Design can be used widely on top of gradient-based descriptors to further improve recognition accuracy.

We also notice that the ADD_AK method is not better than the standard KDES. We believe this is caused by the small size of the codebook (2,000 words). Since 4 types of descriptors are extracted from each image on a dense grid and there are 1360 images in total, this codebook introduces large distortion in quantization, which decreases the performance of the averaging kernel.
5. Conclusion

In this paper, we introduced a problem in object recognition caused by the different picture styles of images. After presenting the first study that links pixel-level editing functions with gradient-based image descriptors, we proposed an Adaptive Descriptor Design (ADD) to solve the problem. We demonstrated that ADD can be widely used as a general framework on top of popular descriptors, and the experimental results show the improvements of ADD on the domain adaptation data set, the standard Oxford Flower data set, and its variants with different picture styles.